Read Structure File: Read secondary structure file

Description

Reads in secondary structure text files into a helix data frame.

Usage

readHelix(file) readConnect(file) readVienna(file, palette = NA) readBpseq(file)

Arguments

file

A text file in connect format, see details for format specifications.

palette

Used to colour basepairs by bracket type. See viennaToHelix for more details.

Value

Returns a helix format data frame.

Details

Helix: Files start with a header line beginning with # followed by the sequence length, followed by a four-column tab-delimited table (with column names), where each row corresponds to a helix in the structure. The four columns are i and j for the left-most and right-most basepair positions respectively, the length of the helix (converging inwards from i and j, and finally an arbitrary value assigned to the helix. Vienna: Dot-bracket notation from Vienna package programs, where each structure consists of matched brackets for basepairs and periods for unbased pairs. Valid brackets are (, , [, <, a,="" b,="" c,="" d="" matched="" with="" ),="" ,="" ],="">, a, b, c, d, respectively. An energy value can be appended to the end of any dot-bracket structure. The function will accept slight variations of the format, including those with FASTA-like headers (in which case line breaks are allows), and those without FASTA-like headers (in which case line breaks are NOT allowed), with both types allowing for a preceding (NOT following) nucleotide sequence for the structure. Multiple entries of the same length may be in a single file, which will be returned as a single helix structure, with respectively energy values (if specified). Connect: Output from mfold and other programs, this format is expected to be a text file beginning with a header line that starts with the sequence length, with an optional Energy/dG value, followed by a six-column tab-delimited table where columns 1 and 5 denote the position that are basepaired (unpaired when column 5 is 0). Other columns are ignored, but for completeness, column 2 is the nucleotide, column 3 and 4 are the positions of the bases left and right of the base specified in column 1 respectively (with 0 denoting non-existance), and column 6 a copy of column 1. Multiple entries of the same length may be in a single file, which will be returned as a single helix structure. All helices will be assigned the energy value extracted from their respective structure header lines. Bpseq: Format used by the Gutell Lab's Comparative RNA Website. The file may optionally begin several header lines (e.g. Filename, Organism, Accession, etc.), followed by a 3-column tab-delimited table for the structure, where column 1 is the base position, base 2 is the nucleotide base, and column 3 is the paired position (0 if unpaired). Certain pieces of header information will be parsed and returned as attributes of the output data frame. Multiple structures can be within a single file, returned as a single helix data frame, with attributes set to those of the first entry.

Examples

Run this code

    file <- system.file("extdata", "helix.txt", package = "R4RNA")
    helix <- readHelix(file)
    head(helix)

    file <- system.file("extdata", "connect.txt", package = "R4RNA")
    connect <- readConnect(file)
    head(connect)
    message("Note connect data assigns structure energy level to all basepairs")

    file <- system.file("extdata", "vienna.txt", package = "R4RNA")
    vienna <- readVienna(file)
    head(vienna)
    message("Note vienna data assigns structure energy level to all basepairs")

    file <- system.file("extdata", "bpseq.txt", package = "R4RNA")
    bpseq <- readBpseq(file)
    head(bpseq)
    message("Note bpseq data has no value assigned to basepairs")

Run the code above in your browser using DataLab